Selection of clusters number and features subset during a two-levels clustering task
Authors
Abstract
Simultaneous selection of the number of clusters and of a relevant subset of features is one of the challenges of data mining. A new approach is proposed to address this difficult issue. It benefits from both two-level clustering approaches and wrapper feature selection algorithms. On the one hand, the former enhances robustness to outliers and reduces the running time of the algorithm. On the other hand, wrapper feature selection (FS) approaches are known to give better results than filter FS methods because the algorithm that uses the data is taken into account. First, a Self-Organizing Map (SOM), trained on the original data set, is clustered using k-means, and the Davies-Bouldin index is used to determine the best number of clusters. Then, an individual pertinence measure guides the backward elimination procedure, and the mutual pertinence of the features is measured using a collective pertinence based on the quality of the clustering.
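As an illustration of the second level of the pipeline, the sketch below (not the authors' code) selects the number of clusters by minimising the Davies-Bouldin index over candidate k-means partitions, then runs a greedy backward feature elimination in which a feature is dropped whenever its removal improves the clustering quality. The first level (the SOM trained on the original data) is omitted here and k-means is applied directly to the data; the function names `best_k` and `backward_elimination` are illustrative, and scikit-learn's `KMeans` and `davies_bouldin_score` stand in for the collective pertinence measure described in the abstract.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score


def best_k(X, k_range=range(2, 8)):
    """Pick the number of clusters minimising the Davies-Bouldin
    index (lower is better)."""
    scores = {k: davies_bouldin_score(
        X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
        for k in k_range}
    return min(scores, key=scores.get)


def backward_elimination(X, min_features=1):
    """Greedily drop the feature whose removal most improves (lowers)
    the Davies-Bouldin index of the k-means partition."""
    kept = list(range(X.shape[1]))
    k = best_k(X)
    best = davies_bouldin_score(
        X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
    while len(kept) > min_features:
        trials = []
        for f in kept:
            sub = [c for c in kept if c != f]  # candidate subset without f
            labels = KMeans(n_clusters=k, n_init=10,
                            random_state=0).fit_predict(X[:, sub])
            trials.append((davies_bouldin_score(X[:, sub], labels), f))
        score, worst = min(trials)
        if score >= best:  # no removal improves the partition: stop
            break
        best = score
        kept = [c for c in kept if c != worst]
    return kept, k, best
```

On data with well-separated clusters plus noise dimensions, this procedure tends to discard the noise features; in the paper's actual method the elimination is additionally guided by an individual pertinence measure rather than by the clustering index alone.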
Similar resources
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
The ensemble clustering with maximize diversity using evolutionary optimization algorithms
Data clustering is one of the main steps in data mining, which is responsible for exploring hidden patterns in non-tagged data. Due to the complexity of the problem and the weakness of the basic clustering methods, most studies today are guided by clustering ensemble methods. Diversity in primary results is one of the most important factors that can affect the quality of the final results. Also...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
Using Particle Swarm Optimisation and the Silhouette Metric to Estimate the Number of Clusters, Select Features, and Perform Clustering
One of the most difficult problems in clustering, the task of grouping similar instances in a dataset, is automatically determining the number of clusters that should be created. When a dataset has a large number of attributes (features), this task becomes even more difficult due to the relationship between the number of features and the number of clusters produced. One method of addr...
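The related work above estimates the number of clusters with the silhouette metric (there, inside a particle swarm optimisation loop, which is omitted here). A minimal sketch of the silhouette-based part alone, assuming scikit-learn and k-means as the base clusterer (the function name `best_k_silhouette` is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_silhouette(X, k_range=range(2, 8)):
    """Choose k by maximising the mean silhouette coefficient
    (in [-1, 1]; higher means tighter, better-separated clusters)."""
    return max(k_range, key=lambda k: silhouette_score(
        X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)))
```

Note the opposite convention to the Davies-Bouldin index used in the main paper: silhouette is maximised, Davies-Bouldin is minimised.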
Feature Selection for Clustering
Clustering is an important data mining task. Data mining often concerns large and high-dimensional data, but unfortunately most of the clustering algorithms in the literature are sensitive to largeness or high dimensionality, or both. Different features affect clusters differently: some are important for clusters while others may hinder the clustering task. An efficient way of handling it is by selecting ...